Towards High Performance Phonotactic Feature for Spoken Language Recognition
نویسنده
چکیده
With the demands of globalization, multilingual speech is increasingly common in conversational telephone speech, broadcast news and internet podcasts. Therefore, automatic spoken language recognition has become an important technology in multilingual speech related applications. For example, automatic spoken language recognition has been used as a preprocessing component for spoken language translation, multilingual speech recognition and spoken document retrieval. Both humans and machines rely on certain informative cues to differentiate one language from another. Inspired by the findings in the discriminative cues for human language recognition, most of the automatic language recognition systems rely on the following three features: acoustic, prosodic and phonotactic. Acoustic features capture spectral characteristics and can be obtained from short-term speech signals. Prosodic features such as tone, intonation, prominence and rhythm can be derived from energy measurements, pitch contour, rate of change. Phonotactic features capture the statistics of lexical constraints and phonotactic patterns. Phonotactic features can be generated from a tokenization front end which converts speech signals into sequences of sound patterns. This thesis focuses on the study of effective phonotactic feature extraction methods for high performance automatic language recognition. Specifically, the main contributions of this thesis are: A novel target-oriented method is proposed to construct parallel phone recognizers for robust phonotactic feature extraction. A subset of the most discriminative phones from an existing phone recognizer is selected to form a target-oriented phone tokenizer (TOPT). The TOPT phone tokenizers, one for each of the target languages, are constructed from an existing phone recognizer without requiring additional transcribed training data.
منابع مشابه
Homogenous ensemble phonotactic language recognition based on SVM supervector reconstruction
Currently, acoustic spoken language recognition (SLR) and phonotactic SLR systems are widely used language recognition systems. To achieve better performance, researchers combine multiple subsystems with the results often much better than a single SLR system. Phonotactic SLR subsystems may vary in the acoustic features vectors or include multiple language-specific phone recognizers and differen...
متن کاملPCA-based Feature Extraction for Phonotactic Language Recognition
Phonotactic language recognition is one of major techniques used for automatic recognition of spoken languages. We propose a feature extraction technique based on PCA to be used with SVM-based systems. This technique improves speed of the training, in some cases more than 1000 times, allowing systems to be effectively trained on much larger data sets. Speed-up of the test phase can be even grea...
متن کاملDimensionality Reduction for Using High-Order n-Grams in SVM-Based Phonotactic Language Recognition
SVM-based phonotactic language recognition is state-of-the-art technology. However, due to computational bounds, phonotactic information is usually limited to low-order phone n-grams (up to n = 3). In a previous work, we proposed a feature selection algorithm, based on n-gram frequencies, which allowed us work successfully with high-order n-grams on the NIST 2007 LRE database. In this work, we ...
متن کاملSelecting phonotactic features for language recognition
This paper studies feature selection in phonotactic language recognition. The phonotactic feature is presented by n-gram statistics derived from one or more phone recognizers in the form of high dimensional feature vectors. Two feature selection strategies are proposed to select the n-gram statistics for reducing the dimension of feature vectors, so that higher order n-gram features can be adop...
متن کاملImplicit Processing of Phonotactic Cues: Evidence from Electrophysiological and Vascular Responses
Spoken word recognition is achieved via competition between activated lexical candidates that match the incoming speech input. The competition is modulated by prelexical cues that are important for segmenting the auditory speech stream into linguistic units. One such prelexical cue that listeners rely on in spoken word recognition is phonotactics. Phonotactics defines possible combinations of p...
متن کامل